I  INTRODUCTION


The goal of the VISIONS project is to develop a system that can
interpret static color images of outdoor scenes [HAN78a,b]. The
interpretation task consists of labeling the various objects in an
image and describing the relationships among them. This task is
difficult, given the complexity and variety inherent in the domain.
The set of objects and possible relations is large, lighting varies,
exact camera models are often not available, shadows and occlusion
obscure the shapes of objects, and seasonal changes introduce
spectral and textural variety. A great deal of knowledge must be
brought to bear in understanding images of outdoor scenes.

A large part of this knowledge concerns the set of objects that can
and do appear in these images and the possible and probable
relations among them. In order to understand the images, detailed
information about the distinguishing characteristics of each object
class must also be available. This paper presents preliminary
results showing how various strategies utilizing four types of
simple features (size, shape, color, and location) can be used to
recognize objects and form the basis of a simple interpretation
system.


The size of an object can aid in its recognition. However, in images
absolute sizes are rarely available, and furthermore members of an
object class often appear in a range of sizes. Thus size is most
important in relative terms: objects can be recognized using the
sizes of reference objects that have been located in the image.

Characteristics of an object's (2-D) shape provide recognition cues.
Curvature, compactness, height-to-width ratio, and rectangularity
are a few shape features that may help to distinguish objects. For
example, certain man-made objects (windows, doors, shutters) exhibit
high rectangularity, little curvature, and a vertical orientation
(greater height than width). However, representing and recovering
complex shape characteristics is very difficult [YOR81] [KEN80].

Color or spectral features are especially useful in identification,
particularly for "natural" objects such as grass, sky, and foliage
whose color tends to be more predictable than that of man-made
objects such as cars and houses. Spectral features include the red,
green, and blue components of an image element's intensity, color
transforms, and simple texture measures. Sets of these features can
be used to characterize different objects. Sky, for example, tends
to have a high blue component value but a low saturation value.
Foliage, on the other hand, tends to be quite saturated. In this
case, saturation is used in recognizing foliage because it not only
CHARACTERIZES an aspect of all foliage but also DISCRIMINATES
foliage from other objects.


Location plays a part in object recognition. The location of certain
objects can often be predicted: sky often appears at the top of an
image; grass, road, or ground often appear at the bottom. As was the
case with size, these object location features can be used not only
to identify possible object classes but also to eliminate other
object classes from consideration. Location can also be
characterized in relative terms, providing identification
information via expected spatial relations among objects.

In general, the information characterized by the four types of
features (size, shape, color, location) is important not only in
absolute but also in relative terms (in the form of relations).
Objects are often identified using other objects as references. This
observation implies that object recognition can be carried out in at
least two ways: simply by listing the expected feature values of an
object class and searching for a match (a local approach), OR within
a context, using some kind of strategy that operates on the feature
values (a global approach). Matching alone does not seem to be
sufficient for most recognition tasks. Thus it is clear that the
process of object identification should consist of a variety of
strategies operating on the types of feature information outlined
above.

The experiments presented below demonstrate the utility of various
strategies operating on feature information in developing an image
interpretation. The strategies are simple: each can be "fooled" in
certain cases. Used together, however, they provide a fairly robust
foundation for a first-pass interpretation system.

II  EXPERIMENTS

The four 128 x 128 images of house scenes used in the experiments
are shown in Figure 1. Three of the images are different views of
the same house. The information gathered during the interpretation
of one of these three images could be used to guide the
interpretation of either or both of the other images, assuming
similar or identical lighting conditions. Such an approach would be
especially useful in motion processing. In the experiments presented
below, however, interpretation strategies have been applied
independently to each of the images.

The domain of house scenes is fairly complex, yet it is manageable
in that the set of commonly occurring object types is not too large
(fewer than 20), and there are a variety of structural and
relational constraints that can be exploited in object recognition.
For example, with many houses, windows are constrained to be located
between two shutters. This type of constraint generates predictions
about the existence and location of certain objects based upon a
partial interpretation and can be incorporated into strategies for
both hypothesis formation and hypothesis verification.

Certain assumptions have been made in the experiments. The system
assumes a camera position that is approximately level, so that the
horizon is expected to be near the center of the image. This
assumption allows the system to predict the extents of sky and
ground regions. The second assumption is that the spectral
attributes of objects are fairly typical; for example, grass is
green rather than brown. This assumption allows reasonable
hypotheses of object identities to be developed using expected
spectral attributes of objects. Finally, the system assumes that a
good segmentation has been provided for establishing a
correspondence between regions and object surfaces.

Figure 1. The four 128 x 128 images used in the experiments. Label
them 1-4 starting in the upper left and proceeding clockwise.
Image 1 is used in most of the other figures.

II.1  IMAGE-INDEPENDENT SPECTRAL ATTRIBUTE MATCHING

Spectral attributes can be used to characterize certain "natural"
objects (bush, grass, sky, tree) whose features are fairly
predictable. There are also certain classes of man-made objects
whose color and texture are predictable, such as roads, sidewalks,
fire hydrants, and stop signs. The simplest use of color and texture
attributes consists of matching the expected feature values of an
object class with those of image regions to form hypotheses of
object identity. The technique of object-to-region matching of
attributes that is presented below has been used previously and is
described only briefly here (see [WIL81]).

Given a set of features and a set of training images of outdoor
scenes, the mean, standard deviation, maximum value, and minimum
value of each feature were computed using hand-selected regions
known to represent the "natural" objects mentioned above. These
statistics were used to form prototype templates of the ranges of
feature values for each object class. The matching process consists
of forming a confidence by comparing the feature values of a region
to the feature values of each of the templates. The confidence value
obtained symbolizes a hypothesis that a certain region represents a
certain object or object part. Maximum confidence is assigned to a
region whose mean feature value is within one standard deviation of
the expected mean for an object. The confidence decreases linearly
to zero at the minimum and maximum values.
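
The confidence rule just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function and variable names are hypothetical, confidence is scaled to
the range 0 to 1, and averaging the per-feature confidences is an
assumed way of combining features (the text does not say how they
are combined).

```python
def feature_confidence(value, mean, std, lo, hi):
    """Confidence for one feature of one region against one template.

    Full confidence inside mean +/- std, falling linearly to zero at
    the template's minimum (lo) and maximum (hi) training values.
    """
    inner_lo, inner_hi = mean - std, mean + std
    if inner_lo <= value <= inner_hi:
        return 1.0
    if value <= lo or value >= hi:
        return 0.0
    if value < inner_lo:
        return (value - lo) / (inner_lo - lo)
    return (hi - value) / (hi - inner_hi)


def region_confidence(features, template):
    """Average the per-feature confidences (assumed combination rule).

    features: {feature_name: region mean value}
    template: {feature_name: (mean, std, lo, hi)}
    """
    scores = [feature_confidence(features[name], *stats)
              for name, stats in template.items()]
    return sum(scores) / len(scores)


def best_label(features, templates):
    """Return the object class whose template matches the region best."""
    return max(templates,
               key=lambda label: region_confidence(features, templates[label]))
```

Each template entry holds the four training statistics named in the
text, so adding an object class only means adding one more entry to
the `templates` dictionary.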

The results of spectral attribute matching in the four images are
summarized in Table I. Tree, grass, and sky regions are identified
fairly accurately. Bush regions were most often misclassified as
tree, accounting for six of the eight bush regions incorrectly
labeled and generating six false alarms for tree. It is not
unreasonable for a system to make errors between different classes
of foliage when the classification is based purely on local
features. Grouping tree and bush under a category of "foliage"
produces better results, with Z'J of 26 target regions being
correctly identified. The portions of the image that are correctly
labeled are shown in white in Figure 2.

Because this matching strategy only deals with a restricted subset
of the objects commonly occurring in outdoor scenes, regions
representing objects not in the subset are always labeled
incorrectly. In many cases the confidence assigned to this erroneous
labeling is sufficiently low (compared to expected values) that the
labeling can be rejected. In other cases the confidence is
relatively high and a labeling error results. For example, in the
images of Figure 1, the white house walls "acquired" many of the
spectral characteristics of sky and hence are often interpreted as
sky. In cases such as this it is unreasonable to expect the system
to distinguish between high match value non-target regions, whose
color and texture attributes are similar to those of the target
object prototypes, and actual target regions. While it is possible
that better results could be achieved by formulating the target vs.
non-target problem as a classical statistical hypothesis testing
problem, it is conjectured that many of the erroneous labels may be
eliminated by the application of labeling constraints derived from
the relationships between objects and the structural properties of
objects appearing in the scene. Experiments described later are a
first attempt to show how this may be accomplished.

Spectral attribute matching is computationally inexpensive if the
object training data have been analyzed previously. If one ignores
the errors involving confusion of foliage categories and the
problems of high match value non-target regions, then the approach
has yielded excellent results. It might be made still more powerful
in several ways. Collecting object attributes across a larger set of
images might strengthen the predictive abilities of the prototype
templates. On the other hand, further data collection might pollute
the statistics already computed. In this case it might be necessary
to add new object sub-classes such as "tree-in-winter" and
"tree-in-spring." Adding new features and readjusting the importance
of each feature used in matching might improve the prototype
templates' characterizations of object classes and thus yield a
better labeling performance.

Spectral attribute matching, as it is currently implemented, can
often provide an accurate initial set of hypotheses upon which to
base the rest of the interpretation.

II.2  IMAGE-DEPENDENT ATTRIBUTE MATCHING VIA OBJECT EXEMPLARS


The process of matching spectral attributes described in the
previous section involves a comparison of feature values of regions
to image-independent feature values of object prototypes. Another
approach that might prove more robust and context-sensitive is the
use of a partial interpretation of the image. Assuming a region in
an image has been identified as a particular object using some
interpretation strategy, the feature values of that region can serve
as an image-specific object template. These feature values can be
used in finding similar regions that most likely represent instances
of the same object class, using the same matching process described
in Section II.1.

Consider the example in Figure 3. Suppose shutter regions have been
identified using shape characteristics. Knowledge about the
structure of houses suggests that regions of significant size that
surround the shutter regions will represent house wall or windows.
Figure 3a shows the identified shutter regions and Figure 3b the
neighboring regions hypothesized to represent house wall or window.
Here region neighbors are strictly adjacent; this requirement could
be relaxed so that nearby regions that are not strictly adjacent
would be included. A house wall template region was selected from
among these regions by searching for the first region that was
larger than a minimum size and had greater than a minimum value on a
color transform feature. The "Q" value of the YIQ television color
transform was used because house regions had consistent values on
this feature across several images. Other features such as intensity
and simple texture measures were not as useful in this respect.

The house wall template region was used in a matching process in
attempting to identify other house regions. Utilizing the level
camera assumption, the knowledge that house wall regions will appear
in a horizontal band of the image can be used to constrain the
processing. Matching was restricted to those regions that overlapped
a horizontal band defined by the upper and lower extents of the
template region. This simple spatial constraint limits the matching
and reduces the number of false alarms that would otherwise occur.

Figure 3. (a) Identified shutter regions. (b) House wall surround.
(c) House template in white with matching regions.

The strategy labels some regions incorrectly. In Figure 3c the
light-colored region is the selected template and the slightly
darker regions are those that matched. Note the errors in the sky
region and tree highlight regions. The houses in the images are
white; they tend to exhibit characteristics of the incident
illumination. Highlights are smooth surface reflections and hence
also exhibit characteristics of incident illumination. The strategy
also fails to identify all house wall regions, particularly those
regions that represent shadowed house wall. In this case, internal
contrast tends to be lower, affecting the texture measures used, and
the spectral components are distributed over lower ranges, resulting
in a poor match between these feature values and those of the
selected exemplar. Both of these kinds of errors are reasonable
given the overall goal of the approach: the formation of label
hypotheses based on a loose notion of feature similarity.

As is the case with many of these simplified strategies, there are
many plausible ways of achieving improvement in performance. The
process might be made more powerful by incorporating stricter
spatial constraints based on world knowledge; for example, matching
might be restricted to those regions strictly adjacent to the
template region or to those regions whose centroids lie within the
horizontal band defined by the template region's upper and lower
extents. Also, the features used in matching can be tailored to the
object type being identified. Finding these characteristic features
involves studying the consistencies of appearance of an object
across many images. Only the features which tend to be invariant for
an object class would be used in matching, thereby reducing the cost
and hopefully producing better results.
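
The horizontal band constraint, in both its overlap form and the
stricter centroid form, might be sketched as follows. The `Region`
class and function names are hypothetical, image rows are assumed to
increase downward, and the centroid row is approximated by the
midpoint of a region's vertical extent rather than computed from its
pixels.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    top: int     # first image row covered by the region
    bottom: int  # last image row covered by the region


def overlaps_band(region, template):
    """True if the region overlaps the band spanned by the template."""
    return region.top <= template.bottom and region.bottom >= template.top


def centroid_in_band(region, template):
    """Stricter test: the (approximate) centroid row lies in the band."""
    centroid_row = (region.top + region.bottom) / 2.0
    return template.top <= centroid_row <= template.bottom


def match_candidates(regions, template, test=overlaps_band):
    """Names of regions that remain eligible for template matching."""
    return [r.name for r in regions
            if r is not template and test(r, template)]
```

Passing `centroid_in_band` as `test` drops regions that merely touch
the band, such as a large sky region whose lower edge dips below the
top of the house wall template.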

While image-specific region templating avoids some of the problems
faced in using an image-independent attribute matcher (e.g.,
lighting variations), the choice of a template remains crucial and
is dependent upon the power and variety of the other interpretation
strategies. For example, the house wall templating strategy
described above depends directly on a strategy for locating
shutters. Within the general structure of VISIONS, strategies are
applied and interpreted in an environment of cooperation and
competition among the various hypotheses developed [HAN78b] [PAR80]
[WIL77]; labeling conflicts arising from the partial evidence
available to each strategy are resolved in the context of more
global information. Thus, although the region templating strategy is
dependent upon correct identification of some of the image, it still
serves as a powerful mechanism for extending a partial
interpretation.

II.3  SKY/GROUND FILTERING

The techniques described so far have relied on color features alone
in attempting to label the regions in an image. A strategy that
incorporates the expected locations of two objects, sky and ground,
can be used to eliminate or "filter out" erroneous hypotheses or to
reduce conflicts between labels generated by separate processes.

In order to implement this strategy a sky template region and a
grass template region must be selected. The sky template is chosen
based on size, color, and location near the top of the image. The
grass template is chosen based on color and location near the bottom
of the image. The spatial extents of these regions are used to mark
the probable lower limit of sky and the upper limit of ground.
Figure 4 shows the sky line and ground line selected.

These two lines provide a rough approximation to the location of the
horizon in the image. This information is used to filter the results
of spectral attribute matching. For example, a region hypothesized
to represent grass that appears above the sky line would have to be
relabeled. This relabeling is accomplished by setting the confidence
value for grass to the lowest possible value of -99.99. By doing
this the next highest confidence value becomes the highest, and the
region has a new object label.

The filtering process is helpful but, like region templating, is
dependent upon careful selection of the sky and grass template
regions. The selection of a low sky line or a high ground line does
not provide much information, but neither does it cause accurately
labeled regions to be relabeled incorrectly. On the other hand, a
high sky line or a low ground line imposes strict constraints on the
region labels and can cause the filtering process to eliminate
correct labelings.

Better strategies for template selection might eliminate this
problem. Having a model of the camera would provide the actual
location of the horizon and furnish more accurate information about
the actual extents of sky and ground. Finally, the ground plane can
sometimes be approximately located by searching for the bottom edges
of vertically oriented surfaces.

II.4  RECTANGLE FINDING

Rectangularity is a shape feature that characterizes many man-made
objects. Doors, windows, and shutters that appear in a house image
are usually rectangular or nearly so. Even rectangular objects that
have been foreshortened by the camera angle can be identified by
locating regions of high rectangularity in the image. These regions
can be identified by applying a function that checks each region's
deviation from rectangularity and saves those regions that survive a
threshold. The deviation is a percentage calculated as follows:

\[
\%\ \text{error} = 100 \times
\frac{\text{area of enclosing rectangle} - \text{actual area}}
     {\text{area of enclosing rectangle}}
\]

Figure 5 shows those image regions that survived a threshold of 25%.

Figure 5. Regions with deviation from rectangularity of ≤ 25%.

Figure 6. Identified shutter regions.

Figure 7. Regions with height-to-width ratios > 5.


Adding height-to-width ratio and size constraints to the rectangle
finding strategy results in a shutter identification procedure.
Figure 6 shows those image regions that were labeled as shutters.
These regions have a height-to-width ratio greater than 5, a
deviation from rectangularity of no more than 25%, and an area of at
least 20 pixels (assumes a certain scale). Figure 7 shows those
regions that were selected based on the height-to-width ratio
constraint alone.
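
The deviation measure and the three shutter constraints can be
sketched as follows. The function names are hypothetical, a region is
represented as a set of (row, column) pixel coordinates, and the
enclosing rectangle is assumed to be the axis-aligned bounding box
(the text does not say how the enclosing rectangle is fitted).

```python
def bounding_box(pixels):
    """Axis-aligned box (top, left, bottom, right) around a pixel set."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def rectangularity_deviation(pixels):
    """Percentage of the enclosing rectangle not covered by the region."""
    top, left, bottom, right = bounding_box(pixels)
    box_area = (bottom - top + 1) * (right - left + 1)
    return 100.0 * (box_area - len(pixels)) / box_area


def is_shutter_candidate(pixels, max_deviation=25.0, min_ratio=5.0,
                         min_area=20):
    """Apply the thresholds quoted in the text to one region."""
    top, left, bottom, right = bounding_box(pixels)
    height = bottom - top + 1
    width = right - left + 1
    return (len(pixels) >= min_area
            and height / width > min_ratio
            and rectangularity_deviation(pixels) <= max_deviation)
```

A filled region 12 pixels tall and 2 pixels wide passes all three
tests, while a 4 x 4 square fails both the ratio and the area test.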

The parameters for the shutter identification procedure were set so
as to give good results in the images under consideration. The size
constraint helps to eliminate small regions that really have no
significance in the interpretation. However, in images where a house
is located far from the camera, the shutters will appear small and
the procedure will fail to label them correctly. Also, the strategy
is likely to confuse doors, windows, and shutters [...] processes
have divided them into two or more regions that are less
rectangular. Improving the results of the segmentation processes,
possibly through a merging process, would likely yield better
performance.


II.5  INFERENCING USING SPATIAL RELATIONS


Within the domain of house scenes, shutters, windows, and doors can
serve as landmarks for locating a house. A house is a structure; its
subparts are objects that exhibit certain typical spatial relations
(e.g., windows fall between shutters). (See [NEfBl].) The location
of an identified object (the landmark) together with some spatial
relation allows the inference of the location of another object
[GAR A'. .]. For example, shutters are usually surrounded (the
spatial relation) by house wall. House wall can be identified by
finding a shutter region (the landmark) and then labeling those
regions that surround the shutter. This is the same idea that was
used to identify "house-part" regions in the region templating
example explained earlier. Information about the structure of
objects and the relations between objects and object subparts is
currently built into various strategies. Work is under way to
develop a structured database that will provide this type of
information to the strategies that [...] shows the results of
finding shutters and then labeling neighboring regions as
"house-part."

Since the strategy is based solely on spatial relations, shadowed
and unshadowed regions alike are labeled as "house-part", even
though they differ greatly in their spectral attributes. This
behavior can result in incorrect labelings when parts of the house
are occluded; for example, a tree in front of the house might have
parts located in proximity to the shutters and be labeled as
"house-part." Also, the segmentation processes often produce small,
thin horizontal or vertical regions that surround the shutter
regions. These are the regions that are located by the strategy,
while other larger, more significant regions are missed.
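
The landmark inference described in this section can be sketched as
a search for regions adjacent to the shutter regions. The
representation is an assumption: the segmentation is taken to be a
2-D grid of integer region identifiers, adjacency is 4-connected, and
the function name is hypothetical.

```python
def adjacent_regions(label_map, landmark_ids):
    """Identifiers of regions strictly adjacent to any landmark region.

    label_map: list of rows, each a list of region identifiers
    landmark_ids: identifiers of regions already labeled (e.g. shutters)
    """
    landmarks = set(landmark_ids)
    neighbors = set()
    n_rows, n_cols = len(label_map), len(label_map[0])
    for r in range(n_rows):
        for c in range(n_cols):
            if label_map[r][c] not in landmarks:
                continue
            # Inspect the four edge-connected neighbors of this pixel.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    other = label_map[rr][cc]
                    if other not in landmarks:
                        neighbors.add(other)
    return neighbors
```

Every region returned would be hypothesized as "house-part",
whatever its color, which is why an occluding tree or a thin sliver
touching a shutter is labeled along with the wall.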

Expanding the neighbor idea to include "nearby" regions as well as
those that are strictly adjacent might produce better results. Also,
the merging and high contrast ideas mentioned in the previous
section are applicable in this case, too. Finally, much work remains
to be done in capturing the spatial relations that commonly occur
between objects in natural scenes and structuring them for use by
the interpretation strategies.

II.6  INFERENCING USING SIZE RELATIONS

The sizes of objects tend to vary a great deal, even within a single
object class. This variability makes it difficult to characterize an
object class based solely on size in absolute terms. Instead an
object is often described or recognized in terms of its size
relative to the size of some other object; for example, the actual
height of a person is often less important than the relationship
between the person's height and the heights of other people or
objects in the environment. This observation suggests that object
recognition can be based in part on relative size relations.

Given the ability to identify some object with reasonable accuracy,
that object's size can be used to predict sizes for other objects
that are located nearby in the scene. The relation of the region
size to the object size can also provide some information about
distance, elevation, and the perspective transformation.

Several tools were developed to investigate the use of size
relations in image interpretation. An object size database was
built; it contains the expected size ranges for the heights and
widths of commonly occurring objects. A perspective module relates
the camera model and image regions to real world surface
characteristics such as orientation, range, elevation, height, and
width. A strategy that uses both these tools was developed. The
strategy consisted of labeling some region based on other features
such as color and shape and then accessing the object size database
to find the expected dimensions for the object label assigned to the
region. These dimensions were passed to the perspective module,
which calculated the range and elevation of the object.

The strategy did not work well. One problem was the basic inability
to label any region with great accuracy. Another was the variability
of expected dimensions stored as ranges of values in the object size
database: it was unclear whether to use the minimum value, the
maximum value, the mean value, or something else. Also, the
perspective module requires a camera model, and these details were
only available for one image. The perspective module has never been
extensively tested, so the validity of the values it returned was
usually in question. For these reasons, the strategy was not
included in the interpretation process.

As further evidence of the difficulties involved in using object
size information in interpretation, the sizes of house and shutter
were compared in the four images. Two different measures of house
size were used: the area of the rectangle that bounded those regions
labeled as "house-part" and the summed areas of those same regions.
The area of the shutter was simply the area of the shutter region.
The ratios of house to shutter are presented in Table II. The
variability exhibited precludes the reliable use of size relations
in object recognition in this context. Furthermore, problems with
segmentation errors and occlusion make the recovery of accurate size
information very difficult.
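
The two house size measures can be sketched as below. The data layout
is an assumption: each "house-part" region is a dictionary holding
its bounding rows and columns and its pixel count, and the function
name is hypothetical.

```python
def house_to_shutter_ratios(house_parts, shutter_area):
    """Return (object-extent ratio, region-area ratio) for one image.

    house_parts: list of dicts with keys top, left, bottom, right, area
    shutter_area: pixel count of the shutter region
    """
    top = min(p["top"] for p in house_parts)
    left = min(p["left"] for p in house_parts)
    bottom = max(p["bottom"] for p in house_parts)
    right = max(p["right"] for p in house_parts)
    # Measure 1: area of the rectangle bounding all house-part regions.
    extent_area = (bottom - top + 1) * (right - left + 1)
    # Measure 2: summed pixel areas of the same regions.
    summed_area = sum(p["area"] for p in house_parts)
    return extent_area / shutter_area, summed_area / shutter_area
```

The extent ratio can never be smaller than the region-area ratio for
non-overlapping regions, since gaps, occluding objects, and unlabeled
wall all fall inside the bounding rectangle; either ratio can then
be compared with the expected value given with Table II.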

The processes that develop the image segmentation and the strategies
for object recognition must be improved before object size relations
can be effectively exploited in image interpretation. Some method of
determining the camera parameters would be helpful. Even when
strategies for using size relations have been developed, they most
likely will be used only as a means of verifying hypotheses
formulated by other strategies.

Table II
Ratios of House Area to Shutter Area

Object Extents

Region Areas

Expected Area Ratio = 139:1
(based on stored values for expected heights and widths)

III  COMBINING THE STRATEGIES: INTERPRETATION

The strategies outlined above rely on color, size, shape, and
location features to identify objects in a scene. Combining these
strategies with a simple blackboard-like hypothesis space [ERM80]
and a scheme for conflict resolution based on strategy reliability
yields a fairly powerful image interpretation system. Processing
is serial, control is hardwired, and all thresholds and parameters
are set automatically.
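
A blackboard-like hypothesis space with reliability-based conflict
resolution might be sketched as follows. This is only an illustration
under stated assumptions: the class and strategy names and the numeric
reliability values are hypothetical, and the rule used here (keep the
label posted by the most reliable strategy) is one simple reading of
"conflict resolution based on strategy reliability", not necessarily
the scheme used in the system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hypothesis:
    region: int    # region identifier from the segmentation
    label: str     # proposed object label
    strategy: str  # name of the strategy that posted the hypothesis


# Hypothetical reliability scores; higher means more trusted.
RELIABILITY: Dict[str, float] = {
    "spectral_match": 0.5,
    "region_template": 0.7,
    "shutter_finder": 0.9,
}


class HypothesisSpace:
    """Shared store to which every strategy invocation adds hypotheses."""

    def __init__(self) -> None:
        self._by_region: Dict[int, List[Hypothesis]] = defaultdict(list)

    def post(self, hyp: Hypothesis) -> None:
        self._by_region[hyp.region].append(hyp)

    def resolve(self) -> Dict[int, str]:
        """For each region, keep the label from the most reliable strategy."""
        return {
            region: max(hyps, key=lambda h: RELIABILITY[h.strategy]).label
            for region, hyps in self._by_region.items()
        }


space = HypothesisSpace()
space.post(Hypothesis(7, "foliage", "spectral_match"))
space.post(Hypothesis(7, "house-part", "shutter_finder"))
space.post(Hypothesis(3, "sky", "spectral_match"))
labels = space.resolve()
```

In this example region 7 receives two conflicting labels and ends up
as "house-part", because the hypothetical shutter_finder strategy is
ranked as more reliable than spectral_match.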

The interpretation process proceeds as follows. The
segmentation routines produce a set of labels that divides the
image into regions. After initializing the hypothesis space and a
few parameters, the system extracts features for every region,
storing the calculated values in arrays that can be accessed by
other procedures. (The values are also stored by region in the
hypothesis space. Each process/strategy invocation adds new
hypotheses to the space.) Next, spectral attribute matching is
performed and the resulting hypotheses are filtered after locating
the approximate bounds of sky and ground. Object exemplars are
chosen based on the preceding results and used to carry out region
templating. Next, a simple foliage finder locates regions likely to
represent foliage by thresholding saturation values. The system
then tries to locate shutters based on rectangularity,
height-to-width ratio, and significant size. If shutters are
found, the surrounding regions are hypothesized to represent house
wall (or windows). One of the surrounding regions is chosen as an
exemplar of house wall, and other wall regions are located using
region templating. The roof is identified using expectations about
rawblue and saturation values and size. Finally, regions are
grouped by object type and conflicts are resolved based on the
reliability of the processes that generated the hypotheses
involved.
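
The shutter test named in this process (rectangularity,
height-to-width ratio, and significant size) could look roughly like
the sketch below. Everything specific in it is an assumption made for
illustration: the Region fields, the definition of rectangularity as
region area divided by bounding-box area, the expectation that
shutters are taller than wide, and the fixed threshold values (the
system described here sets its thresholds automatically).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Region:
    area: int    # number of pixels in the region
    height: int  # bounding-box height in pixels
    width: int   # bounding-box width in pixels


def rectangularity(region: Region) -> float:
    """Fraction of the bounding box filled by the region (1.0 = perfect rectangle)."""
    return region.area / (region.height * region.width)


def is_shutter_candidate(
    region: Region,
    min_rectangularity: float = 0.85,              # hypothetical threshold
    ratio_range: Tuple[float, float] = (1.5, 4.0),  # hypothetical height/width range
    min_area: int = 30,                             # hypothetical "significant size"
) -> bool:
    """Apply the three shape and size tests; all must pass."""
    ratio = region.height / region.width
    return (
        rectangularity(region) >= min_rectangularity
        and ratio_range[0] <= ratio <= ratio_range[1]
        and region.area >= min_area
    )


tall_block = Region(area=90, height=12, width=8)    # compact and taller than wide
square_blob = Region(area=100, height=10, width=10)  # wrong height-to-width ratio
ragged = Region(area=40, height=12, width=8)         # fills little of its box
```

Under these assumed thresholds only tall_block passes; square_blob
fails the ratio test and ragged fails the rectangularity test.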

The results of applying this system to the four images are
shown in Figures 9-12. Labels have been compressed into foliage,
house-part, grass, sky, and road. In general, the system performs
well. Sky, grass, and foliage regions are labeled accurately.
Most of the house has been recognized. There are many small
mistakes: house shadow is labeled as foliage, some tree highlight
and sky regions are labeled as house, and so on. Some regions are
not labeled at all.

What can be done to improve the results? Many suggestions for
improving the individual strategies have been outlined in the
previous sections. Other strategies need to be developed,
especially in the areas of space and size relations. As these new
strategies are included, control will become more important. The
system must move from a fixed control flow to a flexible control
architecture that can decide where to focus the system's attention
and which strategies to apply [WES82a] [WEY82]. Finally, many more
experiments must be designed and run, in different domains and on
different images. The results of these experiments will provide
the best suggestions for designing new strategies and improving
those already in use.

Figures 9-12. Interpretation results for Images 1-4. Labels in
order of decreasing brightness: sky, house, foliage, grass, road,
unknown. (Some labels may be difficult to distinguish due to flaws
in reproduction.)

IV  CONCLUSIONS


Experience with the simple system described above and its
performance on several images provides some insights into the
process of image interpretation. The most obvious of these is that
any image interpretation system must incorporate a great deal of
knowledge. This knowledge base must include information about the
entities and relations that can and do occur in static images of
outdoor scenes, structured so that it can be efficiently accessed
and updated by the system. The complexity of this information and
the structure inherent in the world of outdoor scenes suggest a
representation composed of different levels of abstraction, ranging
from little edge elements "up" to more abstract schemas (structures
that embody or aggregate knowledge about scenes and their
constituent objects and relations). Future research will help to
indicate the point at which such knowledge should move from the
declarative (e.g. object descriptions) to the procedural (e.g.
processes that identify objects). Efforts to develop a robust,
consistent representation are currently underway [WES82b].

The experiments described in this paper have also demonstrated
the utility of four types of features (size, shape, color, and
location) in object identification. Features of these types can be
used in a knowledge base to describe objects and in procedures that
implement generally applicable strategies for recognizing objects
in scenes. Further research will be aimed at developing
finer-grained strategies and features to be used in identifying a
larger set of objects.

Finally, the workings of the simple interpretation system have
shown that features and relations become most important after
having been incorporated within a variety of identification and
verification strategies. Descriptions alone do not constitute an
interpretation system. Variety is the key word. Since all of the
strategies are error-prone, redundancy is required to achieve any
sort of success; strategies must compete, cooperate, and interact.

The strategies presented are simple and strengthened by
assumptions, and yet each strategy seems fairly powerful and
robust. Future work in different domains will test the validity of
this claim.
Strategies are control mechanisms. They correspond roughly to
the coordinated application of knowledge sources in a Hearsay
architecture [ERM80], to meta-rules [DAV79] or control rules
[AIK80] in a production system, and to the processes attached to
frames [MIN75]. While some commitments have been made to
incorporating both bottom-up and top-down processing and parallel
techniques for employing alternative models, much work remains to
be done in choosing or developing an architecture of control that
is powerful enough to guide the interpretation process and handle
such problems as focus of attention, inferencing, and conflict
resolution [HAY77].